Agricultural Knowledge Discovery from Semi-Structured Text

Authors

  • C. Pechsiri
  • A. Kawtrakul
Abstract

This research aims to develop an automatic knowledge discovery system that works on semi-structured Thai text to support plant disease diagnosis. Plant disease diagnosis matters to farmers because it lets them treat infected plants before infections become more severe. Before diagnosing, farmers need knowledge that is retrieved primarily from text, including unstructured and semi-structured documents. Because this knowledge is spread throughout the text, collecting all of it manually is time-consuming. An alternative to the manual approach is to use automatic knowledge discovery processes to acquire concise knowledge for plant disease diagnosis. The knowledge discovery process consists of at least two main steps: knowledge extraction and knowledge generalization. There are two major problems in this research. The first is the knowledge extraction problem, which arises from linguistic phenomena such as zero anaphora and ellipsis and can be addressed with NLP techniques. The second is the generalization problem: the general knowledge obtained is intrinsically uncertain and incomplete. To solve these problems we propose a combination of three techniques. First, a template-matching rule is used to extract knowledge from agricultural documents on websites. Second, a Monte Carlo simulation technique is applied to handle incomplete knowledge of plant disease symptoms in the texts. Third, a fuzzy concept is used to determine the weighted average of the generality of a symptom for each pathogen type or insect type. The results of knowledge generalization will be evaluated by experts, and knowledge extraction will be evaluated in terms of precision and recall. This work is part of ongoing research.

Keywords: knowledge extraction, knowledge generalization, knowledge discovery, fuzzy, Monte Carlo simulation, template matching rule

1 C. Pechsiri, A. Kongwan and A. Kawtrakul, The Specialty Research Unit of Natural Language Processing and Intelligent Information System Technology, Department of Computer Engineering, Kasetsart University, Bangkok, Thailand, [email protected]
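
The abstract states that knowledge extraction will be evaluated in terms of precision and recall. As a minimal sketch of that evaluation, assuming extracted knowledge is represented as (agent type, symptom) pairs compared against an expert-annotated gold set, the two measures could be computed as follows. The function name `precision_recall` and the sample pairs are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted items
    measured against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # items that are both extracted and correct
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard pairs (assumed representation, not from the paper).
gold = {
    ("fungus", "leaf spot"),
    ("fungus", "wilting"),
    ("virus", "yellowing"),
    ("insect", "holes in leaves"),
}

# Hypothetical output of an extraction step: two correct pairs, one wrong pair.
extracted = {
    ("fungus", "leaf spot"),
    ("virus", "yellowing"),
    ("virus", "wilting"),
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision here is the share of extracted pairs that are correct, and recall is the share of gold pairs that were found. Treating the pairs as sets ignores duplicates, which is a simplifying assumption.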


Similar resources

A Web Text Mining Flexible Architecture

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of it is linked, and much of it is redundant. Web Text Mining helps whole knowledge minin...


From Faceted Classification to Knowledge Discovery of Semi-structured Text Records

The maintenance and service records collected and maintained by aerospace companies are a useful resource for in-service engineers providing ongoing support of their aircraft. Such records are typically semi-structured and contain useful information such as a description of the issue and references to correspondence and documentation generated during its resolution. The inform...


Efficient Text and Semi-structured Data Mining: Knowledge Discovery in the Cyberspace

This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over texts such as proximity phrase association patterns and ordered and unordered tree patterns modeling unstructured texts and semi-structured data on the Web. Then, we consider the problem of finding the patterns that opti...


A Detailed Study on Text Mining Techniques

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This aspect is fundamental because most of the Web information is semi-structured due to the nested structure of HTML code, is linked, and is redundant. Web Text Mining helps whole knowledge mining process in mining, extraction and integration of u...


A Study of Text Mining Methods, Applications, and Techniques

Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. The research areas related to data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining also referred to text of data mini...



Publication date: 2005